Deadlock

A deadlock is a situation wherein two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg". The concept of a Catch-22 is similar.

In computer science, Coffman deadlock refers to a specific condition when two or more processes are each waiting for the other to release a resource, or more than two processes are waiting for resources in a circular chain (see Necessary conditions). Deadlock is a common problem in multiprocessing where many processes share a specific type of mutually exclusive resource known as a software lock or soft lock. Computers intended for the time-sharing and/or real-time markets are often equipped with a hardware lock (or hard lock) which guarantees exclusive access to processes, forcing serialized access. Deadlocks are particularly troubling because there is no general solution to avoid (soft) deadlocks.

This situation may be likened to two people who are drawing diagrams, with only one pencil and one ruler between them. If one person takes the pencil and the other takes the ruler, a deadlock occurs when the person with the pencil needs the ruler and the person with the ruler needs the pencil to finish his work with the ruler. Neither request can be satisfied, so a deadlock occurs.

The telecommunications description of deadlock is weaker than Coffman deadlock because processes can wait for messages instead of resources. A deadlock can be the result of corrupted messages or signals rather than merely waiting for resources. For example, a dataflow element that has been directed to receive input on the wrong link will never proceed even though that link is not involved in a Coffman cycle.

Contents

Examples

When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone.
 
— Illogical statute passed by the Kansas Legislature[1]

An example of a deadlock which may occur in database products is the following. Client applications using the database may require exclusive access to a table, and in order to gain exclusive access they ask for a lock. If one client application holds a lock on a table and attempts to obtain the lock on a second table that is already held by a second client application, this may lead to deadlock if the second application then attempts to obtain the lock that is held by the first application. (This particular type of deadlock could be prevented, by using an all-or-none resource allocation algorithm.)

Necessary conditions

There are four necessary conditions for a Coffman deadlock to occur, known as the Coffman conditions from their first description in a 1971 article by Edward G. Coffman, Jr.:

  1. Mutual Exclusion: a resource that cannot be used by more than one process at a time
  2. Hold and Wait: processes already holding resources may request new resources held by other processes
  3. No Preemption: No resource can be forcibly removed from a process holding it, resources can be released only by the explicit action of the process.
  4. Circular Wait: two or more processes form a circular chain where each process waits for a resource that the next process in the chain holds. When circular waiting is triggered by mutual exclusion operations it is sometimes called lock inversion.[2]

Unfulfillment of any of these conditions is enough to preclude Coffman deadlock from ever occurring. However, since the conditions are not sufficient, their mere presence does not itself imply a deadlock. Deadlock is computer hardware dependent.

Prevention

Avoidance

Deadlock can be avoided if certain information about processes are available in advance of resource allocation. For every resource request, the system sees if granting the request will mean that the system will enter an unsafe state, meaning a state that could result in deadlock. The system then only grants requests that will lead to safe states. In order for the system to be able to determine whether the next state will be safe or unsafe, it must know in advance at any time the number and type of all resources in existence, available, and requested. One known algorithm that is used for deadlock avoidance is the Banker's algorithm, which requires resource usage limit to be known in advance. However, for many systems it is impossible to know in advance what every process will request. This means that deadlock avoidance is often impossible.

Two other algorithms are Wait/Die and Wound/Wait, each of which uses a symmetry-breaking technique. In both these algorithms there exists an older process (O) and a younger process (Y). Process age can be determined by a timestamp at process creation time. Smaller time stamps are older processes, while larger timestamps represent younger processes.

Wait/Die Wound/Wait
O needs a resource held by Y O waits Y dies
Y needs a resource held by O Y dies Y waits

It is important to note that a process may be in an unsafe state but would not result in a deadlock. The notion of safe/unsafe states only refers to the ability of the system to enter a deadlock state or not. For example, if a process requests A which would result in an unsafe state, but releases B which would prevent circular wait, then the state is unsafe but the system is not in deadlock.

Detection

Often, neither avoidance nor deadlock prevention may be used. Instead, deadlock detection and process restart are used by employing an algorithm that tracks resource allocation and process states, and rolls back and restarts one or more of the processes in order to remove the deadlock. Detecting a deadlock that has already occurred is easily possible since the resources that each process has locked and/or currently requested are known to the resource scheduler or OS.

Detecting the possibility of a deadlock before it occurs is much more difficult and is, in fact, generally undecidable, because the halting problem can be rephrased as a deadlock scenario. However, in specific environments, using specific means of locking resources, deadlock detection may be decidable. In the general case, it is not possible to distinguish between algorithms that are merely waiting for a very unlikely set of circumstances to occur and algorithms that will never finish because of deadlock.

Deadlock detection techniques include, but is not limited to model checking. This approach constructs a finite state-model on which it performs a progress analysis and finds all possible terminal sets in the model. These then each represent a deadlock.

Distributed deadlock

Distributed deadlocks can occur in distributed systems when distributed transactions or concurrency control is being used. Distributed deadlocks can be detected either by constructing a global wait-for graph, from local wait-for graphs at a deadlock detector or by a distributed algorithm like edge chasing.

In a commitment ordering-based distributed environment (including the strong strict two-phase locking (SS2PL, or rigorous) special case) distributed deadlocks are resolved automatically by the atomic commitment protocol (like a two-phase commit (2PC)), and no global wait-for graph or other resolution mechanism is needed. Similar automatic global deadlock resolution occurs also in environments that employ 2PL that is not SS2PL (and typically not CO; see Deadlocks in 2PL). However, 2PL that is not SS2PL is rarely utilized in practice.

Phantom deadlocks are deadlocks that are detected in a distributed system due to system internal delays but no longer actually exist at the time of detection.

Distributed deadlock prevention

Consider the "when two trains approach each other at a crossing" example defined above. Just-in-time prevention works like having a person standing at the crossing (the crossing guard) with a switch that will let only one train onto "super tracks" which runs above and over the other waiting train(s).

So the issue with the first one is that it does no deadlock prevention at all. The second does not do distributed deadlock prevention. But the second one is redefined to prevent a deadlock scenario the first one does not address.

After a super-thread is finished, the condition changes back to using the logic from the recursive lock, and the exiting super-thread

  1. sets itself as not being a super-thread
  2. notifies the locker that other locked, waiting threads need to re-check this condition

If a deadlock scenario exists, set a new super-thread and follow that logic. Otherwise, resume regular locking.

Issues not addressed above

A lot of confusion revolves around the halting problem. But this logic does not solve the halting problem because the conditions in which locking occurs are known, giving a specific solution (instead of the otherwise required general solution that the halting problem requires). Still, this locker prevents all deadlocked only considering locks using this logic. But if it is used with other locking mechanisms, a lock that is started never unlocks (exception thrown jumping out without unlocking, looping indefinitely within a lock, or coding error forgetting to call unlock), deadlocking is very possible. To increase the condition to include these would require solving the halting issue, since one would be dealing with conditions that one knows nothing about and is unable to change.

Another issue is it does not address the temporary deadlocking issue (not really a deadlock, but a performance killer), where two or more threads lock on each other while another unrelated threads is running. These temporary deadlocks could have a thread running exclusively within them, increasing parallelism. But because of how the distributed deadlock detection works for all locks, and not subsets therein, the unrelated running thread must complete before performing the super-thread logic to remove the temporary deadlock.

One can see the temporary live-lock scenario in the above. If another unrelated running thread begins before the first unrelated thread exits, another duration of temporary deadlocking will occur. If this happens continuously (extremely rare), the temporary deadlock can be extended until right before the program exits, when the other unrelated threads are guaranteed to finish (because of the guarantee that one thread will always run to completion).

Further expansion

This can be further expanded to involve additional logic to increase parallelism where temporary deadlocks might otherwise occur. But for each step of adding more logic, we add more overhead.

A couple of examples include: expanding distributed super-thread locking mechanism to consider each subset of existing locks; Wait-For-Graph (WFG) [1] algorithms, which track all cycles that cause deadlocks (including temporary deadlocks); and heuristics algorithms which don't necessarily increase parallelism in 100% of the places that temporary deadlocks are possible, but instead compromise by solving them in enough places that performance/overhead vs parallelism is acceptable (e.g. for each processor available, work towards finding deadlock cycles less than the number of processors + 1 deep).

Livelock

A livelock is similar to a deadlock, except that the states of the processes involved in the livelock constantly change with regard to one another, none progressing.[3] Livelock is a special case of resource starvation; the general definition only states that a specific process is not progressing.[4]

A real-world example of livelock occurs when two people meet in a narrow corridor, and each tries to be polite by moving aside to let the other pass, but they end up swaying from side to side without making any progress because they both repeatedly move the same way at the same time.

Livelock is a risk with some algorithms that detect and recover from deadlock. If more than one process takes action, the deadlock detection algorithm can be repeatedly triggered. This can be avoided by ensuring that only one process (chosen randomly or by priority) takes action.[5]

See also

References

  1. ^ A Treasury of Railroad Folklore, B.A. Botkin & A.F. Harlow, p. 381
  2. ^ Silviu Andrica, Cristian Zamfir, George Candea. "GoodRun: Enforcing Good Runs in Parallel Programs". EuroSys 2009. https://dslabpc10.epfl.ch/ssl_read/zamfir/goodrun_wip_eurosys09.pdf. 
  3. ^ Mogul, Jeffrey C.; K. K. Ramakrishnan (1996). "Eliminating receive livelock in an interrupt-driven kernel". http://citeseer.ist.psu.edu/326777.html. 
  4. ^ Anderson, James H.; Yong-jik Kim (2001). "Shared-memory mutual exclusion: Major research trends since 1986". http://citeseer.ist.psu.edu/anderson01sharedmemory.html. 
  5. ^ Zöbel, Dieter (October 1983). "The Deadlock problem: a classifying bibliography". ACM SIGOPS Operating Systems Review 17 (4): 6–15. doi:10.1145/850752.850753. ISSN 0163-5980. 

Further reading

External links